{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7dd7c924",
   "metadata": {},
   "source": [
    "# K最近邻算法\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6140c2b",
   "metadata": {},
   "source": [
    "## K最近邻算法的原理"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "788eea3e",
   "metadata": {},
   "source": [
    "- K最近邻算法就是目标数据点在现有数据集中，离哪一部分最近就将其划分为哪一部分。而K字母的含义就是近邻的个数\n",
    "- 在scikit-learn中，K最近邻算法的K值是通过N_neighbors参数来调节的，默认值是5.\n",
    "-  K最近邻算法在回归和分类都可以运用，原理是相同的。当我们使用K最近邻回归计算某个数据点的预测值时，模型会选择离该数据点最近的若干个数据集中的点，并将它们的y值取平均值，并把改平均值作为新数据点的预测值。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caa82735",
   "metadata": {},
   "source": [
    "\n",
    "### K最近邻算法，在python中的实现"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "c5bde5ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_wine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "2cef7d6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "wine_dataset = load_wine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "2cf179b0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wine_dataset.keys() # 了解数据集中有那些属性\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "b85bc986",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(178, 13)"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wine_dataset['data'].shape # 查看数据矩阵信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "9d7a976c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'.. _wine_dataset:\\n\\nWine recognition dataset\\n------------------------\\n\\n**Data Set Characteristics:**\\n\\n    :Number of Instances: 178 (50 in each of three classes)\\n    :Number of Attributes: 13 numeric, predictive attributes and the class\\n    :Attribute Information:\\n \\t\\t- Alcohol\\n \\t\\t- Malic acid\\n \\t\\t- Ash\\n\\t\\t- Alcalinity of ash  \\n \\t\\t- Magnesium\\n\\t\\t- Total phenols\\n \\t\\t- Flavanoids\\n \\t\\t- Nonflavanoid phenols\\n \\t\\t- Proanthocyanins\\n\\t\\t- Color intensity\\n \\t\\t- Hue\\n \\t\\t- OD280/OD315 of diluted wines\\n \\t\\t- Proline\\n\\n    - class:\\n            - class_0\\n            - class_1\\n            - class_2\\n\\t\\t\\n    :Summary Statistics:\\n    \\n    ============================= ==== ===== ======= =====\\n                                   Min   Max   Mean     SD\\n    ============================= ==== ===== ======= =====\\n    Alcohol:                      11.0  14.8    13.0   0.8\\n    Malic Acid:                   0.74  5.80    2.34  1.12\\n    Ash:                          1.36  3.23    2.36  0.27\\n    Alcalinity of Ash:            10.6  30.0    19.5   3.3\\n    Magnesium:                    70.0 162.0    99.7  14.3\\n    Total Phenols:                0.98  3.88    2.29  0.63\\n    Flavanoids:                   0.34  5.08    2.03  1.00\\n    Nonflavanoid Phenols:         0.13  0.66    0.36  0.12\\n    Proanthocyanins:              0.41  3.58    1.59  0.57\\n    Colour Intensity:              1.3  13.0     5.1   2.3\\n    Hue:                          0.48  1.71    0.96  0.23\\n    OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71\\n    Proline:                       278  1680     746   315\\n    ============================= ==== ===== ======= =====\\n\\n    :Missing Attribute Values: None\\n    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)\\n    :Creator: R.A. Fisher\\n    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\\n    :Date: July, 1988\\n\\nThis is a copy of UCI ML Wine recognition datasets.\\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data\\n\\nThe data is the results of a chemical analysis of wines grown in the same\\nregion in Italy by three different cultivators. There are thirteen different\\nmeasurements taken for different constituents found in the three types of\\nwine.\\n\\nOriginal Owners: \\n\\nForina, M. et al, PARVUS - \\nAn Extendible Package for Data Exploration, Classification and Correlation. \\nInstitute of Pharmaceutical and Food Analysis and Technologies,\\nVia Brigata Salerno, 16147 Genoa, Italy.\\n\\nCitation:\\n\\nLichman, M. (2013). UCI Machine Learning Repository\\n[https://archive.ics.uci.edu/ml]. Irvine, CA: University of California,\\nSchool of Information and Computer Science. \\n\\n.. topic:: References\\n\\n  (1) S. Aeberhard, D. Coomans and O. de Vel, \\n  Comparison of Classifiers in High Dimensional Settings, \\n  Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of  \\n  Mathematics and Statistics, James Cook University of North Queensland. \\n  (Also submitted to Technometrics). \\n\\n  The data was used with many others for comparing various \\n  classifiers. The classes are separable, though only RDA \\n  has achieved 100% correct classification. \\n  (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) \\n  (All results using the leave-one-out technique) \\n\\n  (2) S. Aeberhard, D. Coomans and O. de Vel, \\n  \"THE CLASSIFICATION PERFORMANCE OF RDA\" \\n  Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of \\n  Mathematics and Statistics, James Cook University of North Queensland. \\n  (Also submitted to Journal of Chemometrics).\\n'"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wine_dataset['DESCR'] # 查看数据集的基本描述"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "fe5157ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入数据集拆分工具,把他们分成训练集与测试集\n",
    "from sklearn.model_selection import train_test_split\n",
    "X_train,X_test,y_train,y_test = train_test_split(\n",
    "wine_dataset['data'],wine_dataset['target'],random_state=0\n",
    ") # random_state生成伪随机数以他为依据拆分。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "2d0304b4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(178, 13) (133, 13) (45, 13)\n"
     ]
    }
   ],
   "source": [
    "print(wine_dataset['data'].shape,X_train.shape,X_test.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "c21ef494",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 导入KNN分类模型\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "# 指定k值为1\n",
    "knn = KNeighborsClassifier(n_neighbors=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "60f4f7ef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KNeighborsClassifier(n_neighbors=1)"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 用训练集拟合模型\n",
    "knn.fit(X_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "ea185215",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KNeighborsClassifier(n_neighbors=1)\n"
     ]
    }
   ],
   "source": [
    "print(knn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "76d04788",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.7555555555555555\n"
     ]
    }
   ],
   "source": [
    "# 对模型的预测进行打分\n",
    "print(knn.score(X_test,y_test))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "51d8dd9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 假如我们获得一个新的数据\n",
    "X_new = np.array(\n",
    "    [[13.2,2.77,2.51,18.5,96.6,1.04,2.55,0.57,1.47,6.2,1.05,3.33,820]]\n",
    ")\n",
    "result = knn.predict(X_new)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "e6af61db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'class_2'"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wine_dataset['target_names'][int(result)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d7f3b63",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
